Search for: All records

Creators/Authors contains: "Zhang, Miao"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

  1. Free, publicly-accessible full text available March 1, 2026
  2. Greenspaces in communities are critical for mitigating effects of climate change and have important impacts on health. Today, the availability of satellite imagery data combined with deep learning methods allows for automated greenspace analysis at high resolution. We propose a novel green color augmentation for deep learning model training to better detect and delineate types of greenspace (trees, grass) with satellite imagery. Our method outperforms gold standard methods, which use vegetation indices, by 33.1% (accuracy) and 77.7% (intersection-over-union; IoU). The proposed augmentation technique also shows improvement over state-of-the-art deep learning-based methods by 13.4% (IoU) and 3.11% (accuracy) for greenspace segmentation. We apply the method to high-resolution (0.27 m/pixel) satellite images covering Karachi, Pakistan and illuminate an important need: Karachi has 4.17 m² of greenspace per capita, which significantly lags World Health Organization recommendations. Moreover, greenspaces in Karachi are often in areas of economic development (Pearson’s correlation coefficient shows a 0.352 correlation between greenspaces and roads, p < 0.001), and correspond to higher land surface temperature in localized areas. Our greenspace analysis and how it relates to infrastructure and climate is relevant to urban planners, public health and government professionals, and ultimately the public, for improved allocation and development of greenspaces.
    Free, publicly-accessible full text available February 8, 2026
  3. Satellite imagery is being leveraged for many societally critical tasks across climate, economics, and public health. Yet, because of heterogeneity in landscapes (e.g. how a road looks in different places), models can show disparate performance across geographic areas. Given the important potential of disparities in algorithmic systems used in societal contexts, here we consider the risk of urban-rural disparities in identification of land-cover features. This is via semantic segmentation (a common computer vision task in which image regions are labelled according to what is being shown) which uses pre-trained image representations generated via contrastive self-supervised learning. We propose fair dense representation with contrastive learning (FairDCL) as a method for de-biasing the multi-level latent space of a convolutional neural network. The method improves feature identification by removing spurious latent representations which are disparately distributed across urban and rural areas, and is achieved in an unsupervised way by contrastive pre-training. The pre-trained image representation mitigates downstream urban-rural prediction disparities and outperforms state-of-the-art baselines on real-world satellite images. Embedding space evaluation and ablation studies further demonstrate FairDCL’s robustness. As generalizability and robustness in geographic imagery is a nascent topic, our work motivates researchers to consider metrics beyond average accuracy in such applications.
  4. New data sources and AI methods for extracting information are increasingly abundant and relevant to decision-making across societal applications. A notable example is street view imagery, available in over 100 countries, and purported to inform built environment interventions (e.g., adding sidewalks) for community health outcomes. However, biases can arise when decision-making does not account for data robustness or relies on spurious correlations. To investigate this risk, we analyzed 2.02 million Google Street View (GSV) images alongside health, demographic, and socioeconomic data from New York City. Findings demonstrate robustness challenges; built environment characteristics inferred from GSV labels at the intracity level often do not align with ground truth. Moreover, as average individual-level behavior of physical inactivity significantly mediates the impact of built environment features by census tract, intervention on features measured by GSV would be misestimated without proper model specification and consideration of this mediation mechanism. Using a causal framework accounting for these mediators, we determined that intervening by improving 10% of samples in the two lowest tertiles of physical inactivity would lead to a 4.17 (95% CI 3.84–4.55) or 17.2 (95% CI 14.4–21.3) times greater decrease in the prevalence of obesity or diabetes, respectively, compared to the same proportional intervention on the number of crosswalks by census tract. This study highlights critical issues of robustness and model specification in using emergent data sources, showing the data may not measure what is intended, and ignoring mediators can result in biased intervention effect estimates. 
  5. Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S (Ed.)
  6. Abstract Background: Low-depth sequencing allows researchers to increase sample size at the expense of lower accuracy. To incorporate uncertainties while maintaining statistical power, we introduce to analyze population structure of low-depth sequencing data. Results: The method optimizes the choice of nonlinear transformations of dosages to maximize the Ky Fan norm of the covariance matrix. The transformation incorporates the uncertainty in calling between heterozygotes and the common homozygotes for loci having a rare allele and is more linear when both variants are common. Conclusions: We apply to samples from two indigenous Siberian populations and reveal hidden population structure accurately using only a single chromosome. The package is available on https://github.com/yiwenstat/MCPCA_PopGen.
  7. Abstract The 2016–2017 central Italy seismic sequence occurred on an 80 km long normal-fault system. The sequence initiated with the Mw 6.0 Amatrice event on 24 August 2016, followed by the Mw 5.9 Visso event on 26 October and the Mw 6.5 Norcia event on 30 October. We analyze continuous data from a dense network of 139 seismic stations to build a high-precision catalog of ∼900,000 earthquakes spanning a 1 yr period, based on arrival times derived using a deep-neural-network-based picker. Our catalog contains an order of magnitude more events than the catalog routinely produced by the local earthquake monitoring agency. Aftershock activity reveals the geometry of complex fault structures activated during the earthquake sequence and provides additional insights into the potential factors controlling the development of the largest events. Activated fault structures in the northern and southern regions appear complementary to faults activated during the 1997 Colfiorito and 2009 L’Aquila sequences, suggesting that earthquake triggering primarily occurs on critically stressed faults. Delineated major fault zones are relatively thick compared to estimated earthquake location uncertainties, and a large number of kilometer-long faults and diffuse seismicity were activated during the sequence. These properties might be related to fault age, roughness, and the complexity of inherited structures. The rich details resolvable in this catalog will facilitate continued investigation of this energetic and well-recorded earthquake sequence. 